Portfolio Project: Anomaly Detection

ML & Cybersecurity Project

Anomaly Detection in IoT Network Traffic

Developed an anomaly detection pipeline on the CTU-IoT Malware dataset to distinguish malicious from benign network connections. The project applies unsupervised learning techniques to cybersecurity data and covers feature engineering, preprocessing, and model evaluation.

Key Contributions:

Exploratory Data Analysis: Inspected 23 features across 23k+ network traffic entries. Explored data types, distributions, and correlations.
Data Preprocessing:
- Handled missing values and categorical encodings.
- Converted IPs and ports into categorical features.
- Engineered new features (e.g., rolling connection counts over time windows).
Pipeline Construction: Built preprocessing pipelines with ColumnTransformer and Pipeline to standardize numeric features and encode categorical ones.
Anomaly Detection Models: Experimented with clustering and other unsupervised methods to detect unusual patterns that may indicate malware activity.
Cybersecurity Application: Interpreted anomalies in the context of malicious traffic detection.
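
The steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`t`, `src_ip`, `dst_port`, `proto`, `duration`, `orig_bytes`), the helper `rolling_count`, and the 60-second window are all hypothetical, and `IsolationForest` is an assumed stand-in for the unsupervised methods, which this page does not name.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for connection logs (hypothetical columns, not the real dataset).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "t": np.sort(rng.uniform(0, 3600, n)),            # connection start time, seconds
    "src_ip": rng.choice(["10.0.0.1", "10.0.0.2", "10.0.0.3"], n),
    "dst_port": rng.choice(["23", "80", "443"], n),   # ports kept as categories
    "proto": rng.choice(["tcp", "udp"], n),
    "duration": rng.exponential(1.0, n),
    "orig_bytes": rng.poisson(500, n).astype(float),
})


def rolling_count(times, window):
    """For each event in ascending `times`, count events in (t - window, t]."""
    times = np.asarray(times, dtype=float)
    # Number of events at or before the left edge of each window.
    left = np.searchsorted(times, times - window, side="right")
    return np.arange(1, len(times) + 1) - left


# Engineered feature: connections from the same source IP in the last 60 seconds.
df["conn_count_60s"] = df.groupby("src_ip")["t"].transform(
    lambda s: rolling_count(s.to_numpy(), 60.0)
)

numeric_cols = ["duration", "orig_bytes", "conn_count_60s"]
categorical_cols = ["src_ip", "dst_port", "proto"]

# Standardize numeric features and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Assumed detector: Isolation Forest with an illustrative contamination rate.
model = Pipeline([
    ("preprocess", preprocess),
    ("detector", IsolationForest(contamination=0.05, random_state=0)),
])

# fit_predict returns 1 for inliers and -1 for flagged anomalies.
labels = model.fit_predict(df[numeric_cols + categorical_cols])
df["is_anomaly"] = labels == -1
```

Keeping the scaler, encoder, and detector in one `Pipeline` means the same transformations are applied at fit and predict time. On labeled data, the `is_anomaly` flags could then be compared against the malicious/benign labels for evaluation.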

Skills Demonstrated:

Machine Learning (unsupervised learning, clustering, anomaly detection)
Feature engineering for network traffic data
Python ML stack: pandas, scikit-learn, numpy, matplotlib, seaborn
Cybersecurity analytics (malware/attack traffic detection)
Model pipeline design & evaluation

Links:

Rendered Notebook PDF: ML unsupervised learning – anomaly detection pipeline
CTU-IoT Malware Kaggle Dataset: Malware Detection in Network Traffic Data
GitHub

Other Projects

MITRE ATT&CK Techniques

Machine learning pipeline to predict MITRE ATT&CK techniques from categorical features in a cybersecurity dataset

Discover

Social Media Engagement Prediction

Machine learning pipeline to predict engagement on social media posts using structured data

Discover